50 research outputs found

    An O(n log n) Algorithm for Projecting Onto the Ordered Weighted ℓ1 Norm Ball

    The ordered weighted ℓ1 (OWL) norm is a newly developed generalization of the Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) norm. This norm has desirable statistical properties and can be used to perform simultaneous clustering and regression. In this paper, we show how to compute the projection of an n-dimensional vector onto the OWL norm ball in O(n log n) operations. In addition, we illustrate the performance of our algorithm on a synthetic regression test. Comment: 1 figure, 1 table, 14 pages, example added to appendix
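The OWL norm itself is simple to evaluate, which is what makes a fast projection onto its ball useful downstream. A minimal sketch (the function name and the example weights are illustrative, not from the paper):

```python
import numpy as np

def owl_norm(x, w):
    # Ordered weighted ell_1 norm: sum_i w_i * |x|_[i], where
    # |x|_[1] >= |x|_[2] >= ... are the magnitudes of x sorted in
    # decreasing order and w is nonnegative and nonincreasing.
    ax = np.sort(np.abs(x))[::-1]
    return float(np.dot(w, ax))

# With w = (2, 1, 1), the largest magnitude is weighted twice as heavily:
val = owl_norm(np.array([3.0, -1.0, 2.0]), np.array([2.0, 1.0, 1.0]))  # 2*3 + 1*2 + 1*1 = 9
```

Nonincreasing weights are what make the OWL norm convex; with all weights equal it reduces to the ordinary ℓ1 norm.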

    Convergence rate analysis of primal-dual splitting schemes

    Primal-dual splitting schemes are a class of powerful algorithms that solve complicated monotone inclusions and convex optimization problems that are built from many simpler pieces. They decompose problems that are built from sums, linear compositions, and infimal convolutions of simple functions so that each simple term is processed individually via proximal mappings, gradient mappings, and multiplications by the linear maps. This leads to easily implementable and highly parallelizable or distributed algorithms, which often obtain nearly state-of-the-art performance. In this paper, we analyze a monotone inclusion problem that captures a large class of primal-dual splittings as a special case. We introduce a unifying scheme and use some abstract analysis of the algorithm to prove convergence rates of the proximal point algorithm, forward-backward splitting, Peaceman-Rachford splitting, and forward-backward-forward splitting applied to the model problem. Our ergodic convergence rates are deduced under variable metrics, stepsizes, and relaxation. Our nonergodic convergence rates are the first shown in the literature. Finally, we apply our results to a large class of primal-dual algorithms that are a special case of our scheme and deduce their convergence rates. Comment: 31 pages, 1 table
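Forward-backward splitting, one of the schemes analyzed above, is easy to state concretely: alternate a gradient step on the smooth term with a proximal step on the nonsmooth term. A sketch on a toy ℓ1-regularized problem (the function names and the test problem are illustrative, not from the paper):

```python
import numpy as np

def forward_backward(grad_f, prox_g, x0, step, iters=100):
    # Forward-backward splitting for min f(x) + g(x), f smooth:
    #   x_{k+1} = prox_{step*g}(x_k - step*grad_f(x_k))
    x = x0
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)
    return x

# Toy problem: min 0.5*||x - b||^2 + lam*||x||_1. The prox of the l1 term
# is soft-thresholding, so the solution is b shrunk toward zero by lam.
b, lam = np.array([3.0, -0.2, 1.5]), 1.0
grad_f = lambda x: x - b
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_star = forward_backward(grad_f, prox_g, np.zeros(3), step=1.0)
```

Each simple piece (here `grad_f` and `prox_g`) is processed individually, which is the decomposition property the abstract describes.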

    The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems

    We introduce the Asynchronous PALM algorithm, a new extension of the Proximal Alternating Linearized Minimization (PALM) algorithm for solving nonsmooth, nonconvex optimization problems. Like the PALM algorithm, each step of the Asynchronous PALM algorithm updates a single block of coordinates; but unlike the PALM algorithm, the Asynchronous PALM algorithm eliminates the need for sequential updates that occur one after the other. Instead, our new algorithm allows each of the coordinate blocks to be updated asynchronously and in any order, which means that any number of computing cores can compute updates in parallel without synchronizing their computations. In practice, this asynchronous strategy often leads to speedups that increase linearly with the number of computing cores. We introduce two variants of the Asynchronous PALM algorithm, one stochastic and one deterministic. In both the stochastic and deterministic cases, we show that cluster points of the algorithm are stationary points. In the deterministic case, we show that the algorithm converges globally whenever the Kurdyka-Łojasiewicz property holds for a function closely related to the objective function, and we derive its convergence rate in a common special case. Finally, we provide a concrete case in which our assumptions hold.
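A sequential PALM sketch on a toy two-block nonconvex problem may help fix ideas; the asynchronous variant performs the same per-block updates without coordinating their order. The objective, constants, and names below are illustrative, not from the paper:

```python
def palm(x, y, lam=0.1, iters=200):
    # Sequential PALM on min 0.5*(x*y - 2)^2 + lam*|x| + lam*|y|:
    # each block takes a gradient step on the smooth coupling term
    # H(x, y) = 0.5*(x*y - 2)^2 (with a blockwise Lipschitz stepsize),
    # then a prox (soft-thresholding) step on its own nonsmooth term.
    soft = lambda v, t: v - t if v > t else (v + t if v < -t else 0.0)
    for _ in range(iters):
        Lx = max(y * y, 1e-3)                     # Lipschitz bound for dH/dx
        x = soft(x - (x * y - 2.0) * y / Lx, lam / Lx)
        Ly = max(x * x, 1e-3)                     # Lipschitz bound for dH/dy
        y = soft(y - (x * y - 2.0) * x / Ly, lam / Ly)
    return x, y

x, y = palm(1.0, 1.0)
```

From the positive starting point, the iterates settle near a stationary point where the shrinkage keeps the product x*y slightly below 2.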

    Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

    We investigate the stochastic optimization problem of minimizing population risk, where the loss defining the risk is assumed to be weakly convex. Compositions of Lipschitz convex functions with smooth maps are the primary examples of such losses. We analyze the estimation quality of such nonsmooth and nonconvex problems by their sample average approximations. Our main results establish dimension-dependent rates on subgradient estimation in full generality and dimension-independent rates when the loss is a generalized linear model. As an application of the developed techniques, we analyze the nonsmooth landscape of a robust nonlinear regression problem. Comment: 36 pages
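A concrete loss of the composite type described above is the robust phase-retrieval-style loss l(x) = |⟨a, x⟩² − b|: the Lipschitz convex function |·| composed with a smooth quadratic map, hence weakly convex. Its subgradients come from the chain rule. A sketch (the function name and data are illustrative):

```python
import numpy as np

def phase_loss_subgrad(x, a, b):
    # Subgradient of the weakly convex loss l(x) = |<a, x>^2 - b|,
    # the composition of the Lipschitz convex |.| with the smooth map
    # x -> <a, x>^2 - b. By the chain rule (away from the kink):
    #   sign(<a, x>^2 - b) * 2*<a, x> * a
    r = np.dot(a, x) ** 2 - b
    return np.sign(r) * 2.0 * np.dot(a, x) * a

g = phase_loss_subgrad(np.array([2.0, 0.0]), np.array([1.0, 0.0]), 1.0)
```

Subgradient estimation results of the kind proved in the paper concern how well sample averages of such subgradients track their population counterparts.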

    Factorial and Noetherian Subrings of Power Series Rings

    Let F be a field. We show that certain subrings contained between the polynomial ring F[X] = F[X_1, ..., X_n] and the power series ring F[X][[Y]] = F[X_1, ..., X_n][[Y]] have Weierstrass Factorization, which allows us to deduce both unique factorization and the Noetherian property. These intermediate subrings are obtained from elements of F[X][[Y]] by bounding their total X-degree above by a positive real-valued monotonic up function λ on their Y-degree. These rings arise naturally in studying p-adic analytic variation of zeta functions over finite fields. Future research into this area may study more complicated subrings in which Y = (Y_1, ..., Y_m) has more than one variable, and for which there are multiple degree functions, λ_1, ..., λ_m. Another direction of study would be to generalize these results to k-affinoid algebras. Comment: 13 pages
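In the abstract's notation, the intermediate subrings can be written compactly. This is a sketch under the assumption that the X-degree bound applies coefficientwise; the paper's precise growth conditions on λ may differ:

```latex
R_\lambda \;=\; \Bigl\{\, \textstyle\sum_{i \ge 0} a_i(X)\, Y^i \in F[X][[Y]]
  \;:\; \deg_X a_i \le \lambda(i) \ \text{for all } i \ge 0 \,\Bigr\},
\qquad \lambda\colon \mathbb{N} \to \mathbb{R}_{>0} \ \text{monotonic},
```

so that F[X] (coefficients constant in Y beyond degree 0) and the full power series ring correspond to the extreme choices of bound.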

    Stochastic model-based minimization of weakly convex functions

    We consider a family of algorithms that successively sample and minimize simple stochastic models of the objective function. We show that under reasonable conditions on approximation quality and regularity of the models, any such algorithm drives a natural stationarity measure to zero at the rate O(k^{-1/4}). As a consequence, we obtain the first complexity guarantees for the stochastic proximal point, proximal subgradient, and regularized Gauss-Newton methods for minimizing compositions of convex functions with smooth maps. The guiding principle, underlying the complexity guarantees, is that all algorithms under consideration can be interpreted as approximate descent methods on an implicit smoothing of the problem, given by the Moreau envelope. Specializing to classical circumstances, we obtain the long-sought convergence rate of the stochastic projected gradient method, without batching, for minimizing a smooth function on a closed convex set. Comment: 33 pages, 4 figures
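The stochastic proximal point method is the simplest member of this model-based family: each step exactly minimizes the sampled one-term model plus a proximal penalty. A sketch for least squares, where that subproblem has a closed form (the function name, stepsize schedule, and data are illustrative):

```python
import numpy as np

def stochastic_prox_point(A, b, steps=2000, seed=0):
    # Stochastic proximal point for min_x 0.5 * mean_i (a_i . x - b_i)^2:
    # each step minimizes the sampled model 0.5*(a_i . x - b_i)^2 plus
    # the proximal penalty (1/(2t)) * ||x - x_k||^2, which for a linear
    # measurement has the closed-form solution used below.
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for k in range(1, steps + 1):
        t = 1.0 / np.sqrt(k)            # diminishing proximal stepsize
        i = rng.integers(A.shape[0])
        a, bi = A[i], b[i]
        x = x - t * (a @ x - bi) / (1.0 + t * (a @ a)) * a
    return x

x_star = stochastic_prox_point(np.eye(2), np.array([1.0, 2.0]))
```

Unlike a plain stochastic gradient step, the implicit (proximal) step is stable for any stepsize, which is one motivation for the model-based viewpoint.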

    Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions

    Splitting schemes are a class of powerful algorithms that solve complicated monotone inclusion and convex optimization problems that are built from many simpler pieces. They give rise to algorithms in which the simple pieces of the decomposition are processed individually. This leads to easily implementable and highly parallelizable algorithms, which often obtain nearly state-of-the-art performance. In this paper, we provide a comprehensive convergence rate analysis of the Douglas-Rachford splitting (DRS), Peaceman-Rachford splitting (PRS), and alternating direction method of multipliers (ADMM) algorithms under various regularity assumptions including strong convexity, Lipschitz differentiability, and bounded linear regularity. The main consequence of this work is that relaxed PRS and ADMM automatically adapt to the regularity of the problem and achieve convergence rates that improve upon the (tight) worst-case rates that hold in the absence of such regularity. All of the results are obtained using simple techniques. Comment: 40 pages, 3 tables
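The DRS iteration analyzed above has a three-line description. A sketch on a toy strongly convex problem, the regime where the paper's improved rates apply (the function names and the test problem are illustrative):

```python
import numpy as np

def drs(prox_f, prox_g, z0, t=1.0, iters=200):
    # Douglas-Rachford splitting for min f(x) + g(x):
    #   x = prox_{t f}(z);  y = prox_{t g}(2x - z);  z <- z + (y - x)
    z = z0
    for _ in range(iters):
        x = prox_f(z, t)
        y = prox_g(2 * x - z, t)
        z = z + (y - x)
    return prox_f(z, t)

# Toy strongly convex example: f and g are squared distances to p and q,
# so the minimizer of f + g is the midpoint (p + q)/2.
p, q = np.array([0.0, 0.0]), np.array([2.0, 4.0])
prox_f = lambda v, t: (v + t * p) / (1 + t)
prox_g = lambda v, t: (v + t * q) / (1 + t)
x_star = drs(prox_f, prox_g, np.zeros(2))
```

On this strongly convex instance the iteration contracts geometrically, illustrating the "automatic adaptation to regularity" the abstract describes.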

    A Three-Operator Splitting Scheme and its Optimization Applications

    Operator splitting schemes have been successfully used in computational sciences to reduce complex problems into a series of simpler subproblems. Since the 1950s, these schemes have been widely used to solve problems in PDE and control. Recently, large-scale optimization problems in machine learning, signal processing, and imaging have created a resurgence of interest in operator-splitting based algorithms because they often have simple descriptions, are easy to code, and have (nearly) state-of-the-art performance for large-scale optimization problems. Although operator splitting techniques were introduced over 60 years ago, their importance has significantly increased in the past decade. This paper introduces a new operator-splitting scheme for solving a variety of problems that are reduced to a monotone inclusion of three operators, one of which is cocoercive. Our scheme is very simple, and it does not reduce to any existing splitting scheme, yet it recovers the existing forward-backward, Douglas-Rachford, and forward-Douglas-Rachford splitting schemes as special cases. Our new splitting scheme leads to a set of new and simple algorithms for a variety of other problems, including the 3-set split feasibility problems, 3-objective minimization problems, and doubly and multiple regularization problems, as well as the simplest extension of the classic ADMM from 2 to 3 blocks of variables. In addition to the basic scheme, we introduce several modifications and enhancements that can improve the convergence rate in practice, including an acceleration that achieves the optimal rate of convergence for strongly monotone inclusions. Finally, we evaluate the algorithm on several applications. Comment: 52 pages, 5 figures
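In its optimization form, a three-operator splitting of this kind handles f + g + h with h smooth by combining two prox steps with one gradient step per iteration. A sketch on a toy problem with three terms (the function names, stepsize, and test problem are illustrative, not taken from the paper):

```python
import numpy as np

def three_op_split(prox_f, prox_g, grad_h, z0, t=0.5, iters=300):
    # Three-operator splitting for min f(x) + g(x) + h(x), h smooth:
    #   x_g = prox_{t g}(z)
    #   x_f = prox_{t f}(2*x_g - z - t*grad_h(x_g))
    #   z  <- z + (x_f - x_g)
    z = z0
    for _ in range(iters):
        xg = prox_g(z, t)
        xf = prox_f(2 * xg - z - t * grad_h(xg), t)
        z = z + (xf - xg)
    return prox_g(z, t)

# Toy doubly regularized problem: min 0.5*||x - b||^2 + lam*||x||_1
# subject to x >= 0 (smooth term + l1 prox + nonnegativity projection).
b, lam = np.array([2.0, -1.0]), 0.5
grad_h = lambda x: x - b
prox_f = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
prox_g = lambda v, t: np.maximum(v, 0.0)
x_star = three_op_split(prox_f, prox_g, grad_h, np.zeros(2))
```

Setting either prox to the identity recovers forward-backward splitting, and dropping the gradient term recovers Douglas-Rachford, matching the special cases listed in the abstract.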

    Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems

    In this paper, we introduce a stochastic projected subgradient method for weakly convex (i.e., uniformly prox-regular) nonsmooth, nonconvex functions, a wide class which includes the additive and convex composite classes. At a high level, the method is an inexact proximal point iteration in which the strongly convex proximal subproblems are quickly solved with a specialized stochastic projected subgradient method. The primary contribution of this paper is a simple proof that the proposed algorithm converges at the same rate as the stochastic gradient method for smooth nonconvex problems. This result appears to be the first convergence rate analysis of a stochastic (or even deterministic) subgradient method for the class of weakly convex functions. Comment: Updated 9/17/2018: Major revision; added high probability bounds, improved convergence analysis in general, new experimental results. Updated 7/26/2017: Added references to introduction and a couple of simple extensions as Sections 3.2 and 4. Updated 8/23/2017: Added NSF acknowledgements. Updated 10/16/2017: Added experimental results
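The two-loop structure described above, proximal point outside, stochastic subgradient inside, can be sketched in a few lines. Everything below (names, stepsize schedule, averaging, the toy loss) is an illustrative simplification, not the paper's exact algorithm:

```python
import numpy as np

def prox_guided_subgrad(subgrad, x0, rho=1.0, outer=20, inner=100, seed=0):
    # Outer loop: inexact proximal point steps on a rho-weakly convex f.
    # Inner loop: stochastic subgradient descent on the strongly convex
    # model f(y) + rho*||y - center||^2, returning the averaged iterate.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(outer):
        center, avg, y = x.copy(), np.zeros_like(x), x.copy()
        for j in range(1, inner + 1):
            g = subgrad(y, rng) + 2.0 * rho * (y - center)
            y = y - g / (rho * (j + 1))      # stepsize for strong convexity
            avg += (y - avg) / j             # running average of iterates
        x = avg
    return x

# Toy loss f(y) = |y - 1| observed through noisy subgradients;
# the iterates should settle near the minimizer 1.
noisy = lambda y, rng: np.sign(y - 1.0) + 0.1 * rng.standard_normal(y.shape)
x_star = prox_guided_subgrad(noisy, np.array([0.0]))
```

The key point is that each inner subproblem is strongly convex even though f is not convex, which is what makes fast inner solves possible.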

    Convergence rate analysis of several splitting schemes

    Splitting schemes are a class of powerful algorithms that solve complicated monotone inclusions and convex optimization problems that are built from many simpler pieces. They give rise to algorithms in which the simple pieces of the decomposition are processed individually. This leads to easily implementable and highly parallelizable algorithms, which often obtain nearly state-of-the-art performance. In the first part of this paper, we analyze the convergence rates of several general splitting algorithms and provide examples to prove the tightness of our results. The most general rates are proved for the fixed-point residual (FPR) of the Krasnosel'skii-Mann (KM) iteration of nonexpansive operators, where we improve the known big-O rate to little-o. We show the tightness of this result and improve it in several special cases. In the second part of this paper, we use the convergence rates derived for the KM iteration to analyze the objective error convergence rates for the Douglas-Rachford (DRS), Peaceman-Rachford (PRS), and ADMM splitting algorithms under general convexity assumptions. We show, by way of example, that the rates obtained for these algorithms are tight in all cases and obtain the surprising statement: the DRS algorithm is nearly as fast as the proximal point algorithm (PPA) in the ergodic sense and nearly as slow as the subgradient method in the nonergodic sense. Finally, we provide several applications of our result to feasibility problems, model fitting, and distributed optimization. Our analysis is self-contained, and most results are deduced from a basic lemma that derives convergence rates for summable sequences, a simple diagram that decomposes each relaxed PRS iteration, and fundamental inequalities that relate the FPR to the objective error. Comment: 45 pages; 3 figures; added convergence rate analysis of the inexact version of the KM algorithm
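The KM iteration and its fixed-point residual are easy to demonstrate directly. A sketch with a toy nonexpansive map (the function names and example are illustrative):

```python
import numpy as np

def km(T, z0, lam=0.5, iters=100):
    # Krasnosel'skii-Mann iteration z_{k+1} = (1 - lam)*z_k + lam*T(z_k)
    # for a nonexpansive T, tracking the fixed-point residual ||T(z) - z||.
    z = z0
    residuals = []
    for _ in range(iters):
        Tz = T(z)
        residuals.append(float(np.linalg.norm(Tz - z)))
        z = (1 - lam) * z + lam * Tz
    return z, residuals

# Toy nonexpansive map: rotation by 90 degrees, an isometry whose only
# fixed point is the origin. Plain iteration of T would circle forever,
# but the averaged KM iteration contracts toward the fixed point.
rotate = lambda z: np.array([-z[1], z[0]])
z, residuals = km(rotate, np.array([1.0, 0.0]))
```

The decreasing residual sequence is exactly the FPR whose big-O rate the paper sharpens to little-o for general nonexpansive operators.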